Abstract
Myelofibrosis (MF) is a chronic blood cancer characterized by bone marrow fibrosis, extramedullary hematopoiesis and splenomegaly. Approximately 89% of patients present with palpable splenomegaly, along with compromised quality of life and reduced survival. The International Working Group Myeloproliferative Neoplasm Research and Treatment (IWG-MRT) criteria use spleen volume (SV) as part of the clinical improvement (CI) response in MF trials. Spleen response, defined as a ≥35% reduction in SV from baseline, is evaluated as a primary or secondary endpoint. Progressive disease is defined as a ≥25% increase in SV from baseline. Magnetic resonance imaging (MRI) and computed tomography (CT) provide a non-invasive way to assess change in spleen size, both spatially and temporally, in a clinical study. While image acquisition with optimized and harmonized protocols is key, central independent review of images also plays a critical role in correct determination of patient outcome. The aim of this study is to determine whether a double read model for central independent review is necessary to maintain high accuracy of SV estimation. To this end, two measures were assessed: inter-reader variability (IRV), which reflects alignment among independent readers on review criteria, and intra-reader variability (ARV), which reflects the consistency of a reader's review approach over time.
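The two SV thresholds quoted above can be written out as a small calculation. The following is only a minimal sketch: the function names and labels are hypothetical, and it encodes nothing beyond the ≥35% reduction and ≥25% increase cut-offs stated in the text, not the full IWG-MRT response criteria.

```python
def percent_change_from_baseline(baseline_sv, followup_sv):
    """Percent change in spleen volume relative to baseline (negative = reduction)."""
    if baseline_sv <= 0:
        raise ValueError("baseline spleen volume must be positive")
    return 100.0 * (followup_sv - baseline_sv) / baseline_sv


def classify_sv_change(baseline_sv, followup_sv):
    """Apply only the two SV thresholds quoted in the abstract (illustrative labels)."""
    change = percent_change_from_baseline(baseline_sv, followup_sv)
    if change <= -35.0:   # >=35% reduction from baseline
        return "spleen response"
    if change >= 25.0:    # >=25% increase from baseline
        return "progressive disease"
    return "neither threshold met"


# Hypothetical volumes in mL, for illustration only.
print(classify_sv_change(1000.0, 650.0))
print(classify_sv_change(1000.0, 1250.0))
print(classify_sv_change(1000.0, 900.0))
```

Because both thresholds are applied to a ratio of two measured volumes, a small measurement discrepancy near a cut-off could change the classification, which is why reader variability matters for these endpoints.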
A retrospective analysis was performed on imaging data from 12 multi-center MF trials: MRI/CT images at two time-points (baseline and one follow-up) from 142 trial participants for ARV analysis and 85 trial participants for IRV analysis. All images passed image quality checks and were processed for manual segmentation of the spleen by image analysts, followed by an over-read by trained radiologists. The spleen volume was calculated as the sum of the spleen cross-sectional areas across all slices multiplied by the slice interval. For ARV analysis, the images were presented to the same readers in a blinded fashion at least three weeks after the initial review. For IRV analysis, images read by a primary reader were then presented to a secondary reader. The percent discrepancy for ARV and IRV was calculated as the difference between the primary and secondary spleen volumes divided by the average of the two, expressed as a percentage.
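The volume and discrepancy calculations described above can be sketched as follows. This is an illustrative sketch rather than the study's analysis code: the function names, the units (areas in mm², slice interval in mm, volume in mL) and the use of the absolute difference are assumptions not stated in the abstract.

```python
def spleen_volume_ml(slice_areas_mm2, slice_interval_mm):
    """Sum of per-slice cross-sectional areas times the slice interval.

    Assumes areas in mm^2 and interval in mm; 1 mL = 1000 mm^3.
    """
    return sum(slice_areas_mm2) * slice_interval_mm / 1000.0


def percent_discrepancy(primary_sv, secondary_sv):
    """Difference between two reads divided by their average, as a percentage.

    Assumption: the absolute difference is used, so the result is non-negative.
    """
    mean_sv = (primary_sv + secondary_sv) / 2.0
    if mean_sv <= 0:
        raise ValueError("mean spleen volume must be positive")
    return 100.0 * abs(primary_sv - secondary_sv) / mean_sv


# Hypothetical segmentation: three slices, 5 mm apart.
volume = spleen_volume_ml([1000.0, 2000.0, 1000.0], 5.0)
print(volume)

# Hypothetical primary and secondary reads of the same scan, in mL.
print(percent_discrepancy(505.0, 495.0))
```

Normalizing by the average of the two reads, rather than by either read alone, makes the measure symmetric: swapping the primary and secondary volumes gives the same discrepancy.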
The average ARV discrepancy was 0.37±0.55% (mean±standard deviation), as shown in Fig 1a. No subject had an ARV discrepancy of more than 5%. As shown in Fig 2a, the majority of cases had an ARV discrepancy under 1%. These results show excellent consistency in the readers' approach over time, compared with the 2.8±3.5% reported by Harris et al (European Journal of Radiology, 2010). For IRV, the average discrepancy was 0.62±0.85%, as shown in Fig 1b. An IRV discrepancy of more than 5% was seen in 1.1% of cases. As shown in Fig 2b, most cases had an IRV discrepancy within 1%. These results show a high level of alignment between readers in their imaging review approach, compared with the 6.4±9.8% reported by Harris et al (European Journal of Radiology, 2010).
The high level of reliability and repeatability seen across radiological reads suggests that a single read model is sufficient to assess imaging volumetrics-based endpoints. It is important to note that a multi-step approach was used to thoroughly train, test and monitor independent readers throughout the study duration. Readers were chosen based on a high level of experience with the indication and the analysis application. Reader onboarding involved an accurate overview of, and clear instruction on, the review assessments. Multiple MRI/CT imaging cases were used for reader testing and training. Since image quality can significantly influence a reader's confidence level, these cases included examples of imaging artifacts expected on such trials, such as motion artifacts, low image resolution, ghosting and low contrast-to-noise ratio. Routine quality checks and variability assessments were performed throughout the trial duration, with prompt corrective action taken to prevent inaccuracy in study results. These actions included issuing training points or re-reading cases that contained an established error. Further work is necessary to assess how variables such as spleen size, imaging artifacts and change in imaging modality affect reader variability.
No relevant conflicts of interest to declare.